L1 Regularized Linear Temporal Difference Learning
Authors
Abstract
Several recent efforts in the field of reinforcement learning have focused attention on the importance of regularization, but how best to incorporate regularization into reinforcement learning algorithms, and how such changes affect their convergence, remain open areas of research. In particular, little has been written about the use of regularization in online reinforcement learning. In this paper, we describe a novel online stochastic approximation algorithm for reinforcement learning. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point of LARS-TD and LC-TD is an equilibrium fixed point of the algorithm.
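The abstract does not spell out the update rule, but a minimal sketch of one common way to combine an online linear TD(0) update with L1 regularization, via a soft-thresholding (proximal) step after each stochastic update, is shown below. All function names, parameters, and default values here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def online_l1_td0(transitions, n_features, alpha=0.01, gamma=0.95, lam=1e-3):
    """Illustrative online TD(0) with linear value approximation and an
    L1 shrinkage step after each stochastic update (hypothetical sketch,
    not the paper's algorithm).

    transitions: iterable of (phi, reward, phi_next) tuples, where phi and
    phi_next are feature vectors of length n_features.
    """
    theta = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        td_error = r + gamma * phi_next.dot(theta) - phi.dot(theta)
        theta = theta + alpha * td_error * phi      # stochastic TD(0) step
        theta = soft_threshold(theta, alpha * lam)  # L1 proximal (shrinkage) step
    return theta
```

With lam = 0 the shrinkage step is a no-op and the sketch reduces to ordinary linear TD(0); larger values of lam drive more weights exactly to zero, which is the sparsity effect L1 regularization is used for.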
Similar resources
Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization
The construction of a suitable set of features to approximate value functions is a central problem in reinforcement learning (RL). A popular approach to this problem is to use high-dimensional feature spaces together with least-squares temporal difference learning (LSTD). Although this combination allows for very accurate approximations, it often exhibits poor prediction performance because of ...
Full text
$\ell_1$ Regularized Gradient Temporal-Difference Learning
In this paper, we study Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent development of Gradient TD (GTD) algorithms has addressed this problem successfully. However, the success of GTD algorithms requires a set of well-chosen features,...
Full text
Fast Active-set-type Algorithms for L1-regularized Linear Regression
In this paper, we investigate new active-set-type methods for l1-regularized linear regression that overcome some difficulties of existing active set methods. By showing a relationship between l1-regularized linear regression and the linear complementarity problem with bounds, we present a fast active-set-type method, called block principal pivoting. This method accelerates computation by allowi...
Full text
Regularized Off-Policy TD-Learning
We present a novel l1 regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, w...
Full text
Multiplicative Updates for L1-Regularized Linear and Logistic Regression
Multiplicative update rules have proven useful in many areas of machine learning. Simple to implement, guaranteed to converge, they account in part for the widespread popularity of algorithms such as nonnegative matrix factorization and Expectation-Maximization. In this paper, we show how to derive multiplicative updates for problems in L1-regularized linear and logistic regression. For L1-regu...
Full text